Mercury BLASTN: Faster DNA Sequence Comparison using a Streaming Hardware Architecture
Authors
Abstract
Motivation: Large-scale DNA sequence comparison, as implemented by BLAST and related algorithms, is one of the pillars of modern genomic analysis. One way to accelerate these computations is with a streaming architecture, in which processors are arranged in a pipeline that replicates the multistage structure of the algorithm. To achieve high performance, the processor hardware implementing the critical seed matching and ungapped extension stages of BLAST should be specialized to execute these stages as quickly as possible. However, accelerating these stages requires solving two key problems: first, the seed matching stage is not of a form that has traditionally been amenable to hardware acceleration; and second, the accelerated implementation of BLAST should retain sensitivity at least comparable to that of the original software.
Results: We describe Mercury BLASTN, an FPGA-based implementation of BLAST for DNA. Mercury BLASTN combines a Bloom filtering approach to seed matching with a modified ungapped extension algorithm to overcome barriers to placing the early stages of BLAST onto hardware. On a previous-generation FPGA hardware platform, Mercury BLASTN runs 5 to 11 times faster than NCBI BLASTN on current-generation general-purpose CPUs, with the prospect of a further eight-fold speedup on current-generation FPGAs. Moreover, its sensitivity to significant DNA sequence alignments is 99% of that observed with software NCBI BLASTN.
Availability: Academic users should contact the authors for information on acquiring a prototype of the Mercury BLASTN system.
Contact: [email protected]
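To make the idea of Bloom-filter seed matching concrete, the following is a minimal software sketch, not the Mercury BLASTN hardware design. It assumes a word length of 11 bases, a BLAKE2b-based hash family, and a small bit array; the names `BloomFilter` and `candidate_seeds` are hypothetical and chosen only for illustration. Query words are inserted into the filter, the database is streamed past it one word at a time, and an exact table lookup discards the filter's false positives.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter; the hash family is an illustrative assumption."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b differently each time.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode("ascii"), digest_size=8, salt=i.to_bytes(2, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def candidate_seeds(query, database, w=11, num_bits=1 << 16, num_hashes=4):
    """Return (query_offset, database_offset) pairs of exact w-mer matches."""
    bloom = BloomFilter(num_bits, num_hashes)
    query_words = {}
    for i in range(len(query) - w + 1):
        word = query[i:i + w]
        bloom.add(word)
        query_words.setdefault(word, []).append(i)

    hits = []
    for j in range(len(database) - w + 1):  # stream the database word by word
        word = database[j:j + w]
        if word in bloom:  # cheap membership test rejects most non-matching words
            for i in query_words.get(word, ()):  # exact lookup removes false positives
                hits.append((i, j))
    return hits
```

For example, `candidate_seeds("ACGTACGTTAGC", "TTTTACGTACGTTAGCAAAA")` reports the two overlapping 11-mers that the query shares with the database. The design point this sketch illustrates is that most database words fail the inexpensive filter test, so the more costly exact lookup is needed only for a small fraction of positions.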
Related articles
Implementation of Word Matching Stage of BLASTN Using Modified Bloom Filter
The Basic Local Alignment Search Tool (BLAST) is a standard computer application that molecular biologists use to search for sequence similarity in genomic databases. BLASTN is a version of BLAST specifically designed for DNA sequence searches; it finds the similarities between the query sequence and the subject sequence. This similarity is used to understand the function and evolutionary history...
NCBI BLASTN Stage 1 in Reconfigurable Hardware
Recent advances in DNA sequencing have resulted in several terabytes of DNA sequences. These sequences themselves are not informative. Biologists usually perform comparative analysis of DNA queries against these large terabyte databases for the purpose of developing hypotheses pertaining to function and relation. This is typically done using software on a general multiprocessor. However, these ...
High speed BLASTN: an accelerated MegaBLAST search tool
Sequence alignment is a long-standing problem in bioinformatics. The Basic Local Alignment Search Tool (BLAST) is one of the most popular and fundamental alignment tools. The explosive growth of biological sequences calls for faster sequence alignment tools such as BLAST. To this end, we develop high speed BLASTN (HS-BLASTN), a parallel and fast nucleotide database search tool that accelera...
Design and Evaluation of a BLAST Ungapped Extension Accelerator, Master's Thesis
The amount of biosequence data being produced each year is growing exponentially. Extracting useful information from this massive amount of data is becoming an increasingly difficult task. This thesis focuses on accelerating the most widely-used software tool for analyzing genomic data, BLAST. This thesis presents Mercury BLAST, a novel method for accelerating searches through massive DNA datab...
Optimal Spaced Seeds for Homologous Coding Regions
Optimal spaced seeds were developed as a method to increase sensitivity of local alignment programs similar to BLASTN. Such seeds have been used before in the program PatternHunter, and have given improved sensitivity and running time relative to BLASTN in genome-genome comparison. We study the problem of computing optimal spaced seeds for detecting homologous coding regions in unannotated geno...